Improving imbalanced classification using near-miss instances

Abstract

The class imbalance problem is a major issue in classification, i.e., the sample size of the rare (positive) class is often the performance bottleneck. In real-world situations, however, “near-miss” positive instances, i.e., negative but nearly positive instances, are sometimes plentiful. For example, natural disasters such as floods are rare, while there are relatively plentiful near-miss cases where an actual disaster did not occur but the water level approached the bank height. We show that even when true positive instances are quite limited, as in disaster forecasting, accuracy can be improved by obtaining refined label-like side-information of “positivity” (e.g., the river water level) to distinguish near-miss instances from other negatives. Conventional cost-sensitive classification cannot utilize such side-information, and the small sample size of positives causes high estimation variance. Our approach is in line with learning using privileged information (LUPI), which exploits side-information for training without predicting the side-information itself. We theoretically prove that our method reduces the estimation variance, provided that near-miss instances are plentiful, in exchange for additional bias. Results of extensive experiments demonstrate that our method tends to outperform or compares favorably with existing approaches.
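The abstract does not spell out the estimator, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical setup in which each training instance has one feature `x` (say, rainfall), a continuous side-information value `s` (say, water level divided by bank height), and a binary label defined as positive exactly when `s >= 1`. Rather than learning from the few positives alone, it fits a least-squares line of `s` on `x` using every instance, so near-misses with `s` close to 1 also inform the model, and then classifies a new `x` by thresholding the predicted `s`. The side-information is needed only at training time, in the spirit of LUPI. The function names, the linear model, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ~ a * xs + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def train_with_side_info(
    xs: List[float], side_info: List[float], threshold: float = 1.0
) -> Callable[[float], int]:
    """Hypothetical LUPI-style classifier: regress the 'positivity' score on
    the feature, then threshold the prediction. The side-information is used
    only here, during training; prediction needs x alone."""
    a, b = fit_line(xs, side_info)

    def predict(x: float) -> int:
        # Positive when the predicted positivity reaches the threshold.
        return 1 if a * x + b >= threshold else 0

    return predict


# Invented toy data: only one instance is truly positive (s >= 1),
# while several are near-misses with s close to 1.
rain = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
level = [0.2, 0.35, 0.5, 0.7, 0.85, 1.05]
labels = [1 if s >= 1.0 else 0 for s in level]

classify = train_with_side_info(rain, level)
print(labels)  # [0, 0, 0, 0, 0, 1]
print([classify(x) for x in [15.0, 45.0, 65.0]])  # [0, 0, 1]
```

Because all six instances, including the near-misses, shape the fitted slope, the decision boundary does not rest on the single positive example. This informally mirrors the variance-for-bias trade-off the abstract describes, but the sketch makes no claim to the paper's theoretical guarantees.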

Similar resources

Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering

Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used to train the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, the methods of solving this kind of prob...

Evolutionary-based selection of generalized instances for imbalanced classification

In supervised classification, we often encounter many real world problems in which the data do not have an equitable distribution among the different classes of the problem. In such cases, we are dealing with the so-called imbalanced data sets. One of the most used techniques to deal with this problem consists of preprocessing the data prior to the learning process. This paper proposes a m...

Improving imbalanced scientific text classification using sampling strategies and dictionaries

Many real applications suffer from the imbalanced class distribution problem, in which one of the classes is represented by a very small number of cases compared to the other classes. Among the affected systems are those related to the recovery and classification of scientific documentation. Sampling strategies such as oversampling and subsampling are popular in tackling the problem of class imbalance. ...

Organizational Correctives for Improving Recognition of Near-miss Events

Despite decades of research on organizational disasters and their prevention, disasters remain all too commonplace. Scholars across a wide range of disciplines agree that one of the most viable approaches to reducing their occurrence is to observe near-misses, situations where a bad outcome could have occurred except for the fortunate intervention of chance, and use these events to identify...

Improving k Nearest Neighbor with Exemplar Generalization for Imbalanced Classification

A k nearest neighbor (kNN) classifier assigns a query instance to the most frequent class among its k nearest neighbors in the training instance space. For imbalanced class distributions, a query instance is often overwhelmed by majority class instances in its neighborhood and is likely to be classified as the majority class. We propose to identify exemplar minority class training instances and gen...

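The kNN rule summarized in the last entry above can be sketched as follows. This is plain kNN with Euclidean distance and majority voting, not the exemplar-generalization method that entry proposes. The toy points and the labels `"minority"` and `"majority"` are invented to show how a query sitting right next to minority-class points can still be outvoted once k grows large enough to pull in the surrounding majority class.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, float]


def knn_classify(train: Sequence[Tuple[Point, str]], query: Point, k: int) -> str:
    """Return the most frequent label among the k training points closest to query."""
    # Sort training instances by Euclidean distance to the query (stable sort).
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps first-encountered order, i.e., the closer label wins.
    return votes.most_common(1)[0][0]


# Invented toy data: two minority points surrounded by majority points.
train = [
    ((0.0, 0.0), "minority"),
    ((0.5, 0.0), "minority"),
    ((1.0, 0.0), "majority"),
    ((0.0, 1.0), "majority"),
    ((-1.0, 0.0), "majority"),
    ((0.0, -1.0), "majority"),
    ((1.0, 1.0), "majority"),
]

print(knn_classify(train, (0.2, 0.0), k=1))  # minority
print(knn_classify(train, (0.2, 0.0), k=5))  # majority
```

With k = 1 the query takes the label of its single closest neighbor, but with k = 5 the two minority neighbors are outvoted three to two, which is the failure mode the entry describes for imbalanced data.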

Journal

Journal title: Expert Systems With Applications

Year: 2022

ISSN: 1873-6793, 0957-4174

DOI: https://doi.org/10.1016/j.eswa.2022.117130